![]() |
The Twofish Encryption Algorithm: A 128-Bit Block Cipher
by Bruce Schneier ; John Kelsey ; Doug Whiting ; David Wagner ; Chris Hall ; Niels Ferguson Wiley Computer Publishing, John Wiley & Sons, Inc. ISBN: 0471353817 Pub Date: 03/01/99 |
Previous | Table of Contents | Next |
We have learned three important lessons from this analysis:
Higher-order differential cryptanalysis looks at higher order relations (e.g., quadratic) between pairs of plaintext and ciphertexts [Lai94, Knu95b]. These attacks seem to work best against algorithms with simple algebraic functions, only a few rounds, and poor short-term diffusion. In particular, we are not aware of any higher-order differential attack reported in the open literature that is successful against more than six rounds of the target cipher. We cannot find any higher-order differentials that can be exploited in the cryptanalysis of Twofish.
Attacks using truncated differentials apply a differential attack to only a partial block [Knu95b]. We have not found any truncated attacks against Twofish. The almost complete diffusion within a round function makes it very difficult to isolate a portion of the block and ignore the rest of the block. Truncated differential attacks are most often successful when most of the ciphers internal components operate on narrow bit paths; Twofishs 64-bit-wide F function seems to make truncated differential characteristics hard to find. We believe that Twofish is secure against truncated differential attacks. For example, our differential attack, described above, is properly classified as a truncated differential attack, and it can successfully attack only five rounds of the full Twofish.
One aspect of our analysis work is a search for an upper bound on the probability of a differential characteristic. This section contains our results to date. We expect to continue this work and achieve significant improvements over our current results.
The first choice we have to make in differential cryptanalysis is what type of differences to use. Twofish contains S-boxes, an MDS matrix multiply, addition modulo 232, XORs, and rotations. There are two types of differences that we think could be useful: a XOR difference, and a difference mod 232. When we use a XOR difference we have to use approximations for the S-boxes and the additions modulo 232; when we use a differences modulo 232 we have to use approximations for the S-boxes, the MDS matrix multiply, the XORs, and the rotations.
The XOR and addition operations are fairly closely related, and either operation can be approximated with reasonable success in the group of the other operation. In the comparison we will ignore the S-boxes; we assume they are equally hard to approximate in each of the groups. An XOR differential has to approximate two addition operations in each round. An additive differential has to approximate the MDS matrix, a single XOR, and a rotation. We estimate that it is about as difficult to approximate an addition for an XOR differential as it is to approximate an XOR for an additive differential. The rotations are relatively simple to approximate in both cases, so we ignore them for this comparison. Thus, if we ignore the MDS matrix, it would seem that additive differentials are more attractive.
For our analysis, the MDS matrix multiply is best written as a linear function: each output bit is the XOR of several input bits. This is very easy from the point of view of an XOR difference; no approximation is necessary and any given input difference leads to precisely one output difference. For an additive difference this is much harder. There do not seem to be any good approximations of the MDS matrix for an additive difference. Therefore, we estimate that XOR-based differentials are much more effective than additive differentials. In the rest of this paper we will look only at XOR-based differentials.
We use the following notations: Let β be the set of all possible byte values. Let G be the F-function without the key-dependent S-boxes. Thus G consists of two MDS matrix multiplies, the PHT, and the subkey addition.
In this section we look at differential characteristics of the S-boxes. Each S-box consists of a sequence of q-mappings and XORs with a key byte. For q0 and q1 the probability of each differential can easily be computed by trying all possible pairs of inputs.
We define pi (a, b) to be the probability that qi has an output difference of b, given an input difference of a. In other words,
We now look at the first two stages of an S-box. This consists of a q-mapping, followed by an XOR with a key byte, followed by another q-mapping. As usual, we assume uniform random distributions of the input values and the key bytes. We define pij (a, b) to be the probability that this construction gives an output difference b given an input difference a, where i is the number of the first q-mapping, and j is the number of the second q-mapping. It is easy to see that
for i, j ∈ {0, 1}, and we can extend this definition to arbitrary long chains of q-mappings and key-byte XORs. In general, it holds that
S-box | 128-bit Key | 192-bit Key | 256-bit Key |
---|
0 | 1.0649 · 2-8 | 1.0084 · 2-8 | 1.0043 · 2-8 |
1 | 1.0566 · 2-8 | 1.0087 · 2-8 | 1.0043 · 2-8 |
2 | 1.0533 · 2-8 | 1.0097 · 2-8 | 1.0045 · 2-8 |
3 | 1.0538 · 2-8 | 1.0088 · 2-8 | 1.0044 · 2-8 |
Table 9.7. Best Differential Probabilities of the S-boxes under Random Key
for i, j,..., m ∈ {0, 1}. This allows us to compute the exact probabilities for each of the S-boxes in Twofish. Table 9.7 gives the probabilities of the best differential of each of the S-boxes for each key length. From this point of view the S-boxes are very good; there are no high-probability differentials. (Note that the average differential probability is 1/255 = 1.0039 .2-8, as we know that the S-boxes are permutations, and thus the output differential 0 does not occur in non-trivial cases. The best differential probability must be at least as large as the average.)
Note that the numbers in this table hold only when the key bytes are chosen at random. In our analysis we assumed that the probabilities in each q-stage were independent. If we try a differential many times, each time with random input and key byte values, then we expect to get the numbers in the table. However, for any particular set of key bytes there are differentials with a much higher probability (as shown in section 7.2.3). Our computations are no longer valid, because for any fixed key byte, the differential probabilities of pi and pj in equation 9.1 are no longer independent of each other.
Twofish uses the same S-boxes in each round. When analyzing a multi-round differential characteristic, the differential probabilities of each of the round functions are not independent either. This makes the analysis of the probability of a differential characteristic more difficult.
The function F takes a 64-bit input and produces a 64-bit output. Thus there are a total of about 2128 possible differentials. It is clearly not possible to compute or list all of them. To alleviate this problem we will group the differentials in sets, and for every set compute upper bounds on the probability of the differentials in that set.
We split the 2128 different differential patterns into a number of subsets. The input difference is classified by the set of input bytes that are nonzero. There are 256 different classifications of input differences. The output difference too is classified by the set of output bytes that are non-zero. We group differentials with the same input and output classification in the same set. There are therefore 216 different sets of differentials, each containing between 1 and 25516 elements.
We will construct differentials of F in two steps. First we use a differential approximation of the S-boxes, and then we use an approximation of the differentials of G.
Differentials of G. The MDS matrix multiply is purely linear, and thus creates no problem for our differential. The PHT and key addition use addition modulo 232 as basic operation. This makes the differentials non-trivial. A theoretical analysis of differential probabilities is difficult, as the probabilities at the result are not independent of each other. We therefore chose to use numerical simulation to establish bounds on the differential probability.
We are trying to derive an upper bound on differential probabilities. Therefore, we are interested in finding good bounds for the most likely differentials of G. We know that for any 128-bit key, the best differential probability of an S-box is 18/256. If we look only at the S-boxes, then the most likely differentials occur when there are a low number of active S-boxes. The most important task is thus to find good bounds on the differential characteristics of G for differentials with a low number of active S-boxes.
We performed numerical simulations of differentials of G. Given an input difference, we generated n random input pairs with that difference and applied the G function (using random keys). We collected the output differences and counted how often each of them appeared. Due to limited resources we could do this analysis only for moderately large n.
From this data we would like to derive a bound on the differential probability. Let us assume a specific differential occurs k times out of n tries. It is obviously not a good idea to use k/n as a bound on the differential probability. Most possible differences occur zero times, but we should not assume that they have a zero probability. If we knew the distribution of the probability we could give some meaningful bound; for example, saying that the probability is less than x with a confidence level of 1 percent. However, in our case we do not know the distribution of the differential probabilities, and it would be dangerous to assume one. We can, however, reverse the process.
Previous | Table of Contents | Next |